3.8 Efficiency of maximum likelihood estimators. Let $K \succeq M$ for $m \times m$ matrices
$K$, $M$ mean that $K - M$ is nonnegative definite. Let $T_n$ be a sequence of estimators such
that the distribution of $\sqrt{n}(T_n - \theta)$ under $\Pr_\theta$ is asymptotically $N(0, v(\theta))$. By Theorem
3.7.11, under its assumptions, $v(\theta) \succeq I(\theta)^{-1}$ for Lebesgue almost all $\theta$. Thus, the
sequence $\{T_n\}$ will be called "efficient" if for all $\theta$, under $\Pr_\theta$, $\sqrt{n}(T_n - \theta)$ is asymptotically
$N(0, v(\theta))$ with $I(\theta)^{-1} \succeq v(\theta)$. In practice, efficient estimators will have $v(\theta) = I(\theta)^{-1}$ for
all $\theta$. The definition allows for superefficiency for some set of $\theta$ which, under the conditions
of Sec. 3.7, will have Lebesgue measure 0. The efficiency of maximum likelihood estimators
with $v(\theta) = I(\theta)^{-1}$ will be proved under the following assumptions.

(EML-1) $\{P_\theta, \theta \in \Theta\}$ is an equivalent family of laws on a sample space $(X, \mathcal{B})$ having
densities $f(\theta, \cdot) > 0$ with respect to a $\sigma$-finite measure $\mu$, where $\Theta$ is an open subset of a
Euclidean space $\mathbb{R}^m$. The observations $X_1, X_2, \dots$ are i.i.d. $(P_{\theta_0})$ for some $\theta_0 \in \Theta$.

Let $L(\theta,x) := \log f(\theta,x)$ and $\psi(\theta,x) := \nabla_\theta L(\theta,x)$ where $\nabla_\theta$ denotes gradient with
respect to $\theta$.

(EML-2) For each $x \in X$, $f(\cdot, x)$ is $C^1$ with respect to $\theta$, and the Fisher information matrix
$I(\cdot)$ exists on $\Theta$ and is continuous and non-singular at $\theta_0$.

If $E_\theta \nabla_\theta L(\theta,x) = 0$, which will be proved in Theorem 3.8.1 to follow from the given
assumptions, then $I(\theta)$ is the covariance matrix $C$ of $\psi(\theta,x)$.
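The identities $E_\theta \psi(\theta,x) = 0$ and $I(\theta) = \operatorname{Cov}_\theta \psi(\theta,x)$ can be checked exactly in a simple case. The following is a minimal sketch, assuming the Bernoulli family on $\{0, 1\}$ with $0 < \theta < 1$ (a family chosen here for illustration, not one used in the text); the function names are hypothetical.

```python
def score(theta, x):
    """Score psi(theta, x) of the Bernoulli(theta) log density at x in {0, 1}.

    L(theta, x) = x log(theta) + (1 - x) log(1 - theta), so
    psi(theta, x) = x / theta - (1 - x) / (1 - theta).
    """
    return x / theta - (1 - x) / (1 - theta)


def score_moments(theta):
    """Exact mean and second moment of the score under P_theta."""
    probs = {1: theta, 0: 1.0 - theta}
    mean = sum(p * score(theta, x) for x, p in probs.items())
    second = sum(p * score(theta, x) ** 2 for x, p in probs.items())
    return mean, second


theta = 0.3                              # illustrative parameter value
mean, second = score_moments(theta)
fisher = 1.0 / (theta * (1.0 - theta))   # known Fisher information of Bernoulli(theta)

print("E psi      =", mean)
print("E psi^2    =", second)
print("I(theta)   =", fisher)
```

Since the mean of the score is zero, its second moment is its variance, and the two printed values for $E\psi^2$ and $I(\theta)$ should agree up to rounding.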

(EML-3) $\{T_n\}$ is a sequence of maximum likelihood estimators and is consistent, in other
words $T_n \to \theta$ in $\Pr_\theta$-probability as $n \to \infty$ for all $\theta$.

Conditions for consistency of M-estimators were given in Sections 3.3 and 3.5. 

Conditions (AN-4) and (AN-5)(ii) in Section 3.6 will be assumed, locally uniformly in
$\theta_0$. Specifically, recall that for $\delta > 0$ small enough, depending on $\theta$,

$$u(\theta, x, \delta) := \sup\{|\psi(\eta, x) - \psi(\theta, x)| : |\eta - \theta| \le \delta\}.$$

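(EML-4) (i) For each $\theta, \phi \in \Theta$, $\lambda_\phi(\theta) := E_\phi \psi(\theta, x)$ exists in $\mathbb{R}^m$. Let $\lambda(\cdot) := \lambda_{\theta_0}(\cdot)$.
(ii) For some numbers $b > 0$ and $\gamma > 0$, and some neighborhood $U$ of $\theta_0$, for all
$\phi, \theta \in U$, $|\eta - \phi| < \gamma$ implies $\eta \in \Theta$, and $\max(E_\theta u(\phi, x, \delta), E_\theta[u(\phi, x, \delta)^2]) \le b\delta$ for
any $\delta$ such that $0 < \delta < \gamma/2$.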

(EML-5) For some neighborhood $V$ of $\theta_0$, $\sup_{\theta \in V} E_\theta |\psi(\theta, x)|^2 < \infty$.

As in Theorem 3.6.15, let $A$ be the Fréchet derivative of $\lambda(\cdot)$ at $\theta_0$ if it exists.

3.8.1 Theorem. Assume (EML-1) through (EML-5). Then $\lambda(\theta_0) = 0$ and $A$ exists with
$A = -I(\theta_0)$. Also, the distribution of $\sqrt{n}(T_n - \theta_0)$ converges to $N(0, I(\theta_0)^{-1})$ as $n \to \infty$.

Proof. Take $b, \gamma > 0$ such that (EML-4)(ii) holds and such that $|\phi - \theta_0| < \gamma$ implies
$\phi \in U \cap V$ for $V$ in (EML-5). For $\theta \ne \theta_0$ with $|\theta - \theta_0| < \gamma/2$ and $0 \le t \le 1$, let
$\theta_t := \theta_0 + t(\theta - \theta_0)$. Then for each $x$, by (EML-2),

$$L(\theta, x) - L(\theta_0, x) = \int_0^1 \psi(\theta_t, x)\,dt \cdot (\theta - \theta_0).$$

By Theorem 3.3.15 (about Kullback-Leibler divergence),

$$0 \ge \int [L(\theta, x) - L(\theta_0, x)]\, f(\theta_0, x)\,d\mu(x) = \int \int_0^1 \psi(\theta_t, x)\,dt\, f(\theta_0, x)\,d\mu(x) \cdot (\theta - \theta_0) = \int_0^1 \lambda(\theta_t)\,dt \cdot (\theta - \theta_0)$$

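where the interchange of integrals is justified since by (EML-4)(ii) for $\phi = \theta_0$, $|\psi(\theta_t, x)| \le
|\psi(\theta_0, x)| + u(\theta_0, x, \gamma/2)$, an integrable function for $P_{\theta_0}$. Since $\lambda(\cdot)$ is continuous on $U$, also
by (EML-4)(ii), writing $\theta_{tu} := \theta_0 + tu(\theta - \theta_0)$ for $0 < u \le 1$,

$$\int_0^1 \lambda(\theta_{tu})\,dt \to \lambda(\theta_0) \quad \text{as } u \downarrow 0,$$

and $0 \ge \int_0^1 \lambda(\theta_{tu})\,dt \cdot (\theta_u - \theta_0)/u = \int_0^1 \lambda(\theta_{tu})\,dt \cdot (\theta - \theta_0)$. So $\lambda(\theta_0) \cdot (\theta - \theta_0) \le 0$ for any
$\theta$ in a neighborhood of $\theta_0$, which implies $\lambda(\theta_0) = 0$, since $\theta - \theta_0$ ranges over all directions.
Also, by the same argument applied to $\phi$ such that $|\phi - \theta_0| < \gamma/2$ in place of $\theta_0$,
$\int \psi(\phi, x) f(\phi, x)\,d\mu(x) = 0$ for all $\phi \in U$. For (column) vectors $\eta, \zeta \in \mathbb{R}^m$,
$\eta \cdot \zeta = \eta'\zeta \in \mathbb{R}$ and $\eta\zeta'$ is the $m \times m$ matrix $\{\eta_i \zeta_j\}_{i,j=1}^m$.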
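Next, for $|\theta - \theta_0| < \gamma/2$,

$$\begin{aligned}
\lambda(\theta) - \lambda(\theta_0) = \lambda(\theta) &= \int \psi(\theta, x) f(\theta_0, x)\,d\mu(x) \\
&= -\int \psi(\theta, x)\,[f(\theta, x) - f(\theta_0, x)]\,d\mu(x) \\
&= -\int \psi(\theta, x) \int_0^1 \psi(\theta_t, x) f(\theta_t, x)\,dt \cdot (\theta - \theta_0)\,d\mu(x) \\
&= -\int \psi(\theta, x) \left[\int_0^1 \psi(\theta_t, x) f(\theta_t, x)\,dt\right]' d\mu(x)\,(\theta - \theta_0).
\end{aligned}$$

Now,

$$\int \psi(\theta, x) \left[\int_0^1 \psi(\theta_t, x) f(\theta_t, x)\,dt\right]' d\mu(x)
= \int \int_0^1 \psi(\theta_t, x)\psi(\theta_t, x)' f(\theta_t, x)\,dt\,d\mu(x) + r(\theta) = \int_0^1 I(\theta_t)\,dt + r(\theta)$$

where, interchanging integrals by the Tonelli-Fubini theorem,

$$|r(\theta)| \le \int_0^1 \int u(\theta_t, x, |\theta - \theta_0|)\,|\psi(\theta_t, x)|\,f(\theta_t, x)\,d\mu(x)\,dt \le O(|\theta - \theta_0|^{1/2})$$

by the Cauchy-Bunyakovsky-Schwarz inequality applied to the two functions $g_t(\theta, x) :=
u(\theta_t, x, |\theta - \theta_0|) f(\theta_t, x)^{1/2}$ using (EML-4)(ii) for $\phi = \theta_t$ and $h_t(\theta, x) := |\psi(\theta_t, x)| f(\theta_t, x)^{1/2}$
using (EML-5). Since $I(\cdot)$ is continuous at $\theta_0$, it follows that $A = -I(\theta_0)$ as stated. Recall
that the covariance $C$ of $\psi(\theta_0, x)$ is $I(\theta_0)$.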



Next, we need to check the hypotheses of Section 3.6. (AN-1) follows from (EML-1)
and (EML-3). In (AN-2), measurability of $\psi(\theta, \cdot)$ follows from that of $f(\theta, \cdot)$ as a density,
and the fact that the components of the gradient of the measurable function $L(\cdot, \cdot)$ with
respect to $\theta$, which exist by (EML-2), are measurable as limits of a sequence of measurable
functions along sequences $\phi_k = \theta + (1/k)e_i$ as $k \to \infty$ where $e_i$ is one of the $m$ standard
unit vectors. Separability of $\psi$ follows from its continuity with respect to $\theta$, (EML-2), since
$f(\cdot, \cdot) > 0$ (EML-1). In (AN-3), existence of $\lambda(\theta)$ is assumed in (EML-4)(i), and $\lambda(\theta_0) = 0$
has been proved. (AN-4)(i) follows from $A = -I(\theta_0)$, as proved, and the fact that $I(\theta_0)$
is non-singular (EML-2). (AN-4)(ii) follows from (EML-4)(ii) and (AN-5) from (EML-5).
So we have all the hypotheses (AN-1) through (AN-5). By Theorem 3.6.15, recalling that
in this section $\psi(\theta, x) = \nabla_\theta L(\theta, x)$ with covariance $I(\theta_0)$ at $\theta = \theta_0$, the distribution of
$\sqrt{n}(T_n - \theta_0)$ converges to $N(0, I(\theta_0)^{-1})$, proving the theorem. □
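The three conclusions of Theorem 3.8.1 can be illustrated numerically. The following is a rough sketch, assuming the exponential family with rate $\theta > 0$, $f(\theta, x) = \theta e^{-\theta x}$ on $(0, \infty)$, which is not treated in the text; for it $\psi(\theta, x) = 1/\theta - x$, $\lambda(\theta) = 1/\theta - 1/\theta_0$, $I(\theta) = 1/\theta^2$, and the maximum likelihood estimator is the reciprocal of the sample mean. The sample size, number of replications, and seed are arbitrary choices, and a simulation of this kind only suggests, and does not prove, the limit.

```python
import random
import statistics


def mle_rate(sample):
    """MLE of the rate for Exponential(theta) data: 1 / (sample mean)."""
    return len(sample) / sum(sample)


def lam(theta, theta0):
    """lambda(theta) = E_{theta0} psi(theta, x), where psi(theta, x) = 1/theta - x."""
    return 1.0 / theta - 1.0 / theta0


def scaled_errors(theta0, n, reps, rng):
    """Return `reps` simulated values of sqrt(n) * (T_n - theta0)."""
    values = []
    for _ in range(reps):
        sample = [rng.expovariate(theta0) for _ in range(n)]
        values.append(n ** 0.5 * (mle_rate(sample) - theta0))
    return values


theta0 = 2.0
fisher = 1.0 / theta0 ** 2            # I(theta0) for this family

# Central finite difference approximating the derivative A of lambda at theta0.
h = 1e-5
deriv = (lam(theta0 + h, theta0) - lam(theta0 - h, theta0)) / (2 * h)

rng = random.Random(0)                # fixed seed so the run is repeatable
errors = scaled_errors(theta0, n=400, reps=2000, rng=rng)
empirical_var = statistics.pvariance(errors)

print("lambda(theta0)     =", lam(theta0, theta0))
print("finite-diff A      =", deriv, "  -I(theta0)      =", -fisher)
print("empirical variance =", empirical_var, "  I(theta0)^{-1} =", 1.0 / fisher)
```

The empirical variance should be close to $I(\theta_0)^{-1} = \theta_0^2$, with a discrepancy reflecting both Monte Carlo error and the finite sample size.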

It can be interesting to investigate the possibility that the assumption in (EML-2)
that $f(\cdot, x)$ be $C^1$ in $\theta$ might be weakened. Huber (1967) proposed that the derivative of
$L(\cdot, \cdot)$ with respect to $\theta$ need only exist "in measure," not necessarily at all $x$ or $\theta$, one
possible interpretation of which is: for each $\theta$, there is a vector-valued function $\psi(\theta, x)$
such that for each $\phi \in \mathbb{R}^m$,

$$\lim_{t \to 0}\, [L(\theta + t\phi, x) - L(\theta, x)]/t = \phi \cdot \psi(\theta, x), \tag{3.8.2}$$

where the convergence is in probability with respect to $x$, and $\theta + t\phi \in \Theta$ for $t$ small
enough. Consider the following

Example. Let $X = \Theta$ be the open interval $(0, 1) \subset \mathbb{R}$ and let

$$f(\theta, x) := (1 + \theta)^{-1}[1 + 1_{(0,\theta]}(x)]$$

with respect to Lebesgue measure. Since $2\theta + (1 - \theta) = 1 + \theta$ this does give probability
densities. We have

$$L(\theta, x) = -\log(1 + \theta) + (\log 2)\,1_{(0,\theta]}(x),$$

and $\partial L(\theta, x)/\partial\theta = -1/(1 + \theta)$ for $x \ne \theta$, so this is the derivative in probability $\psi(\theta, x)$ by
the definition (3.8.2). Strangely, it does not depend on $x$. Thus $\lambda(\theta) = -1/(1 + \theta)$ also.
For $n$ i.i.d. observations $X_1, \dots, X_n$, $1/n$ times the log likelihood is

$$L_n(\theta, \{X_j\}_{j=1}^n) := \frac{1}{n}\sum_{j=1}^n L(\theta, X_j) = -\log(1 + \theta) + (\log 2)F_n(\theta),$$

where $F_n$ is the empirical distribution function based on $X_1, \dots, X_n$. Let the true
parameter $\theta_0 = \phi$ for some $\phi \in (0, 1)$. We know by the Glivenko-Cantelli theorem (RAP,
Theorem 11.4.2) that almost surely $F_n(t)$ converges to the true distribution function $F(t)$
uniformly in $t$. To find a maximum likelihood estimate of $\theta$, we need approximately to
maximize $\eta(\theta) := -\log(1 + \theta) + (\log 2)F(\theta)$. The derivative of this with respect to $\theta$ is

$$\eta'(\theta) = -(1 + \theta)^{-1} + (\log 2)(1 + \phi)^{-1}[1 + 1_{(0,\phi)}(\theta)]$$
for $\theta \ne \phi$. The right term is piecewise constant in $\theta$, and the derivative of $-(1 + \theta)^{-1}$ is
$(1 + \theta)^{-2} > 0$. It follows that $\eta$ equals the convex function $-\log(1 + \theta)$ plus a piecewise
linear function, so it is convex on each interval $[0, \phi]$ and $[\phi, 1]$. Clearly $\eta(0) = \eta(1) = 0$.
We have $\eta(\phi) = -\log(1 + \phi) + (\log 2)(2\phi)/(1 + \phi)$. To show that this is strictly positive
for $0 < \phi < 1$ we want to show that $(1 + \phi)\log(1 + \phi) < (2\log 2)\phi$. Both sides are
0 at 0 and equal $2\log 2$ at 1. The left side is strictly convex since its second derivative is
$1/(1 + \phi) > 0$, and the right side is linear, so it's true that $\eta(\phi) > 0$ for $0 < \phi < 1$. At
$\theta = \phi$, the left and right derivatives of $\eta$ satisfy $\eta'(\phi-) > 0$, $\eta'(\phi+) < 0$. Thus $\theta = \phi$ gives
a local maximum of $\eta$. By the convexity on $[0, \phi]$ and $[\phi, 1]$ and since $\eta(0) = \eta(1) = 0$,
$\theta = \phi$ gives the unique global maximum of $\eta(\cdot)$.
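The shape of $\eta$ can be checked numerically for a particular $\phi$. The sketch below assumes $\phi = 0.4$ and a grid of step $0.001$ on $[0, 1]$, both arbitrary choices, and uses the true distribution function $F(t) = 2t/(1 + \phi)$ for $t \le \phi$ and $F(t) = (t + \phi)/(1 + \phi)$ for $t \ge \phi$, obtained by integrating the density of the example; the function names are hypothetical.

```python
import math


def true_cdf(t, phi):
    """F(t) for the density (1 + phi)^{-1} [1 + 1_{(0, phi]}(x)] on (0, 1)."""
    if t <= phi:
        return 2.0 * t / (1.0 + phi)
    return (t + phi) / (1.0 + phi)


def eta(theta, phi):
    """eta(theta) = -log(1 + theta) + (log 2) F(theta)."""
    return -math.log(1.0 + theta) + math.log(2.0) * true_cdf(theta, phi)


phi = 0.4                                   # illustrative true parameter
grid = [k / 1000 for k in range(1001)]      # grid on [0, 1] containing phi
best = max(grid, key=lambda th: eta(th, phi))

# One-sided derivatives of eta at phi, from the formula for eta'.
left_slope = (2.0 * math.log(2.0) - 1.0) / (1.0 + phi)
right_slope = (math.log(2.0) - 1.0) / (1.0 + phi)

print("eta(0), eta(1)  =", eta(0.0, phi), eta(1.0, phi))
print("eta(phi)        =", eta(phi, phi))
print("grid maximizer  =", best)
print("eta'(phi-), eta'(phi+) =", left_slope, right_slope)
```

On this grid the maximizer coincides with $\phi$, and the one-sided slopes have the signs stated above.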

Since $F_n$ is a right-continuous step function and increases at its jumps, maximum
likelihood estimators will exist for all $n$ and each equals one of $X_1, \dots, X_n$. Almost surely
the values of the log likelihood at the $X_j$ are all different, so the MLE is unique. By the
Glivenko-Cantelli theorem, the maximum likelihood estimators will be consistent (converge
to $\phi$). Thus (EML-3) holds. It is not hard to verify that (EML-1), (EML-4), and (EML-5)
all hold and that the Fisher information $I(\theta)$ exists and is continuous and non-zero. Thus
all of (EML-1) through (EML-5) hold except that in (EML-2), the $C^1$ condition has been
weakened to differentiability in probability. The given proof of Theorem 3.8.1 doesn't work
in this case, since in the first display, $L(\theta, x) - L(\theta_0, x)$, which does depend on $x$, can't be
equal to an integral of $\psi$ which doesn't depend on $x$.
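The maximum likelihood estimator of the example is easy to compute, since it suffices to compare the values of $L_n$ at the sample points. The following is a minimal sketch, assuming inverse-distribution-function sampling from $f(\phi, \cdot)$ with $\phi = 0.4$ and a fixed seed (all arbitrary choices); it uses $F_n(X_{(k)}) = k/n$ at the $k$-th smallest observation, which holds when there are no ties.

```python
import math
import random


def draw(phi, rng):
    """Inverse-CDF draw from the density (1 + phi)^{-1} [1 + 1_{(0, phi]}(x)]."""
    u = rng.random()
    if u <= 2.0 * phi / (1.0 + phi):
        return u * (1.0 + phi) / 2.0       # invert F(x) = 2x / (1 + phi)
    return u * (1.0 + phi) - phi           # invert F(x) = (x + phi) / (1 + phi)


def mle(sample):
    """Maximize -log(1 + theta) + (log 2) F_n(theta) over the sample points."""
    xs = sorted(sample)
    n = len(xs)
    best_value, best_theta = -math.inf, None
    for k, x in enumerate(xs, start=1):    # F_n(x) = k / n at the k-th smallest point
        value = -math.log(1.0 + x) + math.log(2.0) * k / n
        if value > best_value:
            best_value, best_theta = value, x
    return best_theta


phi = 0.4                                  # illustrative true parameter
rng = random.Random(1)                     # fixed seed so the run is repeatable
estimates = {}
for n in (100, 1000, 10000):
    sample = [draw(phi, rng) for _ in range(n)]
    estimates[n] = mle(sample)
    print(n, estimates[n])
```

The printed estimates should approach $\phi$ as $n$ grows, in line with the consistency claimed above.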

NOTE 

The proof of efficiency of the maximum likelihood estimator is from Huber (1967).
Assumption (EML-2) is strengthened to make the proof work.